NPUW: Enable PREFILL/GENERATE configs in LLMCompiledModel #28154

AsyaPronina · 2024-12-20T01:55:13Z

Details:

Added parsing of the passed NPUW_LLM_PREFILL_CONFIG and NPUW_LLM_GENERATE_CONFIG options
Added parsing of the passed NPUW_LLM_PAD_TOKEN_ID option

Tickets:

EISW-149349
EISW-149350

Related PRs:

OpenVINO GenAI: Static llm pipeline dynamic shape model openvino.genai#1240

dmatveev · 2024-12-23T11:00:47Z

@TolyaTalamanov please review

src/plugins/intel_npu/src/al/include/intel_npu/npuw_private_properties.hpp

src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model.cpp

src/plugins/intel_npu/src/al/include/intel_npu/config/npuw.hpp

src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model.cpp

TolyaTalamanov · 2024-12-24T14:22:53Z

src/plugins/intel_npu/src/al/include/intel_npu/npuw_private_properties.hpp

+ * Tell NPUW the configuration for compilation of prefill model.
+ * NOTE: !! Write-only !!
+ */
+static constexpr ov::Property<std::string> prefill_config{"NPUW_LLM_PREFILL_CONFIG"};


Wondering why we even need a Property for this?

The idea is that the user may provide it like this:

auto model = core.read_model(...); auto compiled = core.compile_model(model, "NPU", {{"NPUW_LLM_PREFILL_CONFIG", {...}}});

Note that the user does not need to set or get this config later on. It only has to be passed once.
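A minimal sketch of that call shape, using standard-library stand-ins rather than the real OpenVINO types. The `AnyMap` alias, `CompiledStub`, `compile_model_stub`, and the `SOME_OPTION` key are all hypothetical names introduced for illustration; only the `NPUW_LLM_PREFILL_CONFIG` key comes from this PR.

```cpp
#include <any>
#include <map>
#include <string>

// Hypothetical stand-in for ov::AnyMap: a string-keyed map of type-erased values.
using AnyMap = std::map<std::string, std::any>;

// Hypothetical stand-in for a compiled model: it keeps only what it parsed.
struct CompiledStub {
    std::string device;
    AnyMap prefill_config;  // stays empty when no prefill config was passed
};

// Made-up function, not core.compile_model(): reads the nested config once.
CompiledStub compile_model_stub(const std::string& device, const AnyMap& properties) {
    CompiledStub compiled{device, {}};
    const auto it = properties.find("NPUW_LLM_PREFILL_CONFIG");
    if (it != properties.end()) {
        // std::any_cast throws std::bad_any_cast if the value is not an AnyMap.
        compiled.prefill_config = std::any_cast<AnyMap>(it->second);
    }
    return compiled;
}

// Example call: the nested map is handed over once, at compile time.
CompiledStub make_example() {
    AnyMap prefill{{"SOME_OPTION", std::string("YES")}};  // placeholder key and value
    return compile_model_stub("NPU", {{"NPUW_LLM_PREFILL_CONFIG", prefill}});
}
```

The caller never reads the value back in this sketch, which matches the "passed once" usage described in the comment.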

It is just for us, so that everything is in one place.

Plus, these are also properties, so they shouldn't be handled any other way, because that would look like a hack. We need a unified place to show all the properties we have and a unified way of handling them.

It is just for us, so that everything is in one place.

TBH, I didn't get the point. What are the things, and why should they be in one place?

My point is that having LLM config params (e.g. NPUW_LLM_PREFILL_CONFIG, ...) as properties complicates the implementation, as it brings more responsibility to handle them properly. When it's just ov::AnyMap, it's parsed once in llm_compiled_model.cpp and then forgotten.

All properties that are passed into core.compile_model(model, device, properties) should be handled in one place.
They are passed as properties into core.compile_model(...), not as extra parameters or some custom mechanism. That means we should treat them as properties.

TolyaTalamanov · 2024-12-24T14:23:47Z

src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model.cpp

@@ -308,6 +311,11 @@ void ov::npuw::LLMCompiledModel::set_property(const ov::AnyMap& properties) {

 ov::Any ov::npuw::LLMCompiledModel::get_property(const std::string& name) const {
    OPENVINO_SUPPRESS_DEPRECATED_START
+    if (name == ov::intel_npu::npuw::llm::prefill_config.name() ||


I don't believe it's really needed; see the comment above.

get_property() might not be needed at all here. Since it is redundant functionality, I propose at least handling everything in a unified way here so as not to create a mess.

Keys provided to the LLM pipeline must not be properties, so there won't be any mess.

And what should they be in this situation? And how should they be passed into core.compile_model() then?

compile_model accepts ov::AnyMap, doesn't it? https://docs.openvino.ai/2024/api/c_cpp_api/classov_1_1_core.html

The GenAI pipeline does it exactly this way.
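The hunk above is truncated, so the body of the added branch is not shown. One plausible reading, given the "Write-only" note on the property, is that get_property refuses to return such keys. A minimal sketch under that assumption follows; `ConfigStub`, `is_rejected`, and the choice of which keys are write-only are hypothetical and are not taken from LLMCompiledModel.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical model of a config holder with write-only keys; not LLMCompiledModel.
class ConfigStub {
public:
    void set(const std::string& name, const std::string& value) {
        m_values[name] = value;
    }

    // Returns the stored value, but refuses keys that are marked write-only.
    std::string get_property(const std::string& name) const {
        if (write_only_keys().count(name) != 0) {
            throw std::invalid_argument(name + " is write-only");
        }
        return m_values.at(name);  // throws std::out_of_range for unknown keys
    }

private:
    // Assumption: both config keys are treated as write-only.
    static const std::set<std::string>& write_only_keys() {
        static const std::set<std::string> keys{"NPUW_LLM_PREFILL_CONFIG",
                                                "NPUW_LLM_GENERATE_CONFIG"};
        return keys;
    }

    std::map<std::string, std::string> m_values;
};

// Helper for the usage check: true when get_property rejects the key.
bool is_rejected(const ConfigStub& cfg, const std::string& name) {
    try {
        cfg.get_property(name);
    } catch (const std::invalid_argument&) {
        return true;
    }
    return false;
}
```

This extra branch is the kind of bookkeeping the reviewer argues can be avoided if the keys are consumed at compile time and never registered as properties.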

src/plugins/intel_npu/src/plugin/npuw/llm_infer_request.cpp

dmatveev · 2024-12-24T21:33:24Z

@TolyaTalamanov have you finished the review? Should this be merged?

@AsyaPronina there are merge conflicts

TolyaTalamanov · 2024-12-27T08:55:19Z

src/plugins/intel_npu/src/al/include/intel_npu/npuw_private_properties.hpp

+ * Tell NPUW the configuration for compilation of prefill model.
+ * NOTE: !! Write-only !!
+ */
+static constexpr ov::Property<ov::AnyMap> prefill_config{"NPUW_LLM_PREFILL_CONFIG"};


NPUW_LLM_PREFILL_CONFIG and NPUW_LLM_GENERATE_CONFIG are supposed to be passed to compile(...) once and then can be forgotten. Why do we need to define properties for that?

Because we pass them as properties.

Yes, that is exactly the question. My point was not to make it a property.

That would only simplify the implementation. I don't see a use case where a user would want to query these properties for read/write. Is there one?

Discussed internally

TolyaTalamanov · 2024-12-27T08:57:13Z

src/plugins/intel_npu/src/al/include/intel_npu/npuw_private_properties.hpp

@@ -421,6 +429,13 @@ static constexpr ov::Property<uint32_t> min_response_len{"NPUW_LLM_MIN_RESPONSE_
 */
 static constexpr ov::Property<std::string> generate_hint{"NPUW_LLM_GENERATE_HINT"};


Same here: I wouldn't make it a property.

Discussed internally

src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model.cpp

TolyaTalamanov · 2024-12-27T09:01:19Z

src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model.cpp

+    // preserve them somewhere.
+    auto prefill_config_opt = pop_option(npuw_llm_props, std::string("NPUW_LLM_PREFILL_CONFIG"));
+    auto generate_config_opt = pop_option(npuw_llm_props, std::string("NPUW_LLM_GENERATE_CONFIG"));
+
    m_cfg.update(any_copy(npuw_llm_props));


I believe nothing from npuw_llm_props should get into m_cfg, right?

Everything related to the LLM pipeline can be extracted here and then forgotten.

That is true, but how would you check and test what you have passed?

The same way as it's done in GenAI. You can put any checks within this file...

Got it! Discussed internally
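To make the "extract here and then forget" idea concrete, here is a minimal sketch built on standard-library stand-ins. The PR's real pop_option helper is not shown in this thread, so the signature below is an assumption, and `AnyMap`, `ParsedLlmOptions`, and `extract_llm_options` are hypothetical names.

```cpp
#include <any>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for ov::AnyMap.
using AnyMap = std::map<std::string, std::any>;

// Assumed shape of a pop_option-style helper: remove `name` from `config`
// and return its value, or std::nullopt if the key is absent.
std::optional<std::any> pop_option(AnyMap& config, const std::string& name) {
    const auto it = config.find(name);
    if (it == config.end()) {
        return std::nullopt;
    }
    std::optional<std::any> value{std::move(it->second)};
    config.erase(it);
    return value;
}

// LLM-only options pulled out of the incoming properties.
struct ParsedLlmOptions {
    std::optional<std::any> prefill_config;
    std::optional<std::any> generate_config;
};

// After this call, `props` no longer contains the two LLM config keys,
// so they cannot leak into a later generic config update.
ParsedLlmOptions extract_llm_options(AnyMap& props) {
    return {pop_option(props, "NPUW_LLM_PREFILL_CONFIG"),
            pop_option(props, "NPUW_LLM_GENERATE_CONFIG")};
}
```

Any validation of the extracted values could sit right next to the extraction, which is roughly what the reply about putting checks within the same file suggests.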

src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model.cpp

TolyaTalamanov

I propose not making the LLMCompiledModel config options properties...

The rest of implementation LGTM 👍

AsyaPronina requested review from a team as code owners December 20, 2024 01:55

github-actions bot added the category: NPU and category: NPUW labels Dec 20, 2024

AsyaPronina force-pushed the npuw_llm_model_configs branch from e38b474 to 7d88863 Compare December 20, 2024 02:10

dmatveev assigned TolyaTalamanov Dec 23, 2024

AsyaPronina force-pushed the npuw_llm_model_configs branch from 7d88863 to b52da47 Compare December 23, 2024 18:12

TolyaTalamanov reviewed Dec 24, 2024

AsyaPronina mentioned this pull request Dec 24, 2024

Static llm pipeline dynamic shape model openvinotoolkit/openvino.genai#1240

Open

TolyaTalamanov reviewed Dec 27, 2024

dmatveev added this to the 2025.0 milestone Dec 27, 2024

dmatveev changed the title ~~Added possibility to pass PREFILL/GENERATE configs and pad_token_id~~ NPUW: Enable PREFILL/GENERATE configs in LLMCompiledModel Dec 30, 2024

AsyaPronina added 3 commits December 31, 2024 12:23

Added possibility to pass PREFILL/GENERATE configs and pad_token_id

c706912

Fixed clang-format

760642e

Fixed according review comments

e4bd158

TolyaTalamanov approved these changes Dec 31, 2024

AsyaPronina force-pushed the npuw_llm_model_configs branch from a263f2c to 5a81b3c Compare December 31, 2024 12:37

Fixed clang-format

0c0b36a

AsyaPronina force-pushed the npuw_llm_model_configs branch from 5a81b3c to 0c0b36a Compare December 31, 2024 14:40
